Abstract
Background To date, no study has examined the effectiveness of social distancing, while controlling for social mobility and social distancing restrictions in the United States. We utilize the quasi-experimental setting created by the nationwide protests precipitated by George Floyd’s tragic death on May 25, 2020, to assess the causal impact of social distancing on the spread of SARS-CoV-2.
Methods Our sample period spans from January 22, 2020, to June 20, 2020, and consists of 474,422 county-days representing 3,142 counties from all 50 states and the District of Columbia. To assess the change in COVID-19 case counts following the protests, we employ a differences-in-differences estimation strategy in a multivariate setting, in which we control for social distancing restrictions and social mobility across counties. We also control for covariates that may influence SARS-CoV-2 transmission, and implement placebo tests using a Monte Carlo simulation.
Findings We document a country-wide increase of 3·06 cases per day, per 100,000 population, following the onset of the protests (95%CI: 2·47–3·65), and a further increase of 1·73 cases per day, per 100,000 population, in the counties in which the protests took place (95%CI: 0·59–2·87). Relative to the week preceding the onset of the protests, this represents a 61·2% country-wide increase in COVID-19 cases, and a further 34·6% increase in the protest counties.
Interpretation Our study documents a significant increase in COVID-19 case counts in counties that experienced a protest, and we conclude that social distancing practices causally impact the spread of SARS-CoV-2. The observed effect cannot be explained by changes in social distancing restrictions and social mobility, and placebo tests rule out the possibility that this finding is attributable to chance.
Funding We acknowledge the financial support from the Smith School of Business Distinguished Faculty Fellowship at Queen’s University.
1 Introduction
The highly contagious novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), responsible for coronavirus disease 2019 (COVID-19), emerged in December 2019 in Wuhan city, Hubei province, China. 1 The initial outbreak quickly evolved into a public health emergency of international concern, and by March 2020, the World Health Organization (WHO) characterized COVID-19 as a pandemic. 2 As of June 2020, COVID-19 has reached over 180 countries and regions, and the total number of confirmed cases has surpassed 10 million globally.3 When compared to other countries, COVID-19 has spread throughout the United States (U.S.) at an unparalleled rate, infecting over 2·5 million individuals and claiming over 125,000 lives. 4
Transmission of SARS-CoV-2 can occur through both direct and indirect modes, including person-to-person contact and the spread of respiratory droplets from infected individuals via coughing and sneezing. 5 Recent evidence estimates the average and median basic reproduction number (R0) of SARS-CoV-2 as 3·28 and 2·79, respectively. The R0 indicates the contagiousness and transmissibility of a virus; an R0 greater than one implies that each infected individual transmits the virus to more than one other person, on average, so that the number of infections grows. Public health measures, designed in consideration of the virus’s specific transmission properties, have been implemented with the aim of reducing the R0 to a value less than one. As research has demonstrated that respiratory droplets carrying SARS-CoV-2 can travel as far as 6 feet (2 meters), 6 social distancing, the maintenance of a physical distance of at least 6 feet from others, has been introduced as an important public health measure. A variety of social distancing restrictions have been instituted across the U.S., ranging from statewide stay-at-home orders to more focused policies, including non-essential business closures, large gathering bans, school closure mandates, and restaurant and bar limits. 7 Moreover, the U.S. federal government has granted individual states the authority to design their own COVID-19 mitigation strategies; as a result, the extent and type of social distancing policies adopted differ across states. 8
The widespread adoption of social distancing restrictions in various jurisdictions has created an opportunity to examine the effectiveness of social distancing measures in reducing the spread of SARS-CoV-2. In the U.S., research examining government-imposed restrictions found that social distancing measures were effective in reducing the doubling rate of COVID-19 among U.S. states, 9 as well as the daily growth rate of COVID-19 cases across counties, 7,10 with a lag period consistent with the 14-day incubation time of SARS-CoV-2. 11 This is consistent with early predictive models which suggest that the absence of social distancing measures would result in a greater spread of SARS-CoV-2. 12–14 However, recent evidence suggests that rather than reducing the number of daily confirmed cases, social distancing merely stabilizes the spread of COVID-19. 9 This lack of consensus regarding the effectiveness of social distancing measures underscores the need for a study of the causal impact of these measures on the SARS-CoV-2 infection rate.
Research has demonstrated that greater population mobility influences the R0 of SARS-CoV-2, and facilitates the spread of COVID-19 across different geographic areas. 15 Given the relationship between mobility and R0, several studies have used mobility data as a measure of social distancing when examining the effectiveness of social distancing in reducing the spread of SARS-CoV-2. 7,16–18 However, social mobility measures represent an imperfect proxy for social distancing, because individuals can be mobile while still maintaining the recommended 6 foot interpersonal separation to prevent viral transmission. Therefore, future studies should control for mobility in order to identify the direct relationship between social distancing and the SARS-CoV-2 infection rate.
Social distancing practices were abruptly relaxed during the mass protests precipitated by the tragic death of George Floyd in Minneapolis, MN, on May 25, 2020. During these protests, thousands of people across the U.S. congregated, potentially increasing their exposure to SARS-CoV-2. The unpredictable nature of the protests creates a natural experimental setting to investigate the causal impact of social distancing on the SARS-CoV-2 infection rate. Two key requirements for the identification of the causal link between social distancing and the spread of SARS-CoV-2 are satisfied in this setting, namely: 1) the existence of a strong theoretical basis supporting the relationship in question and, 2) exogenous variation in the variable of interest, i.e. social distancing. 19 The latter is key to establishing causality, because it mitigates concerns that omitted variables correlated with both the protests and the spread of SARS-CoV-2 might be driving our findings. This experimental setting also enables us to circumvent common concerns about endogeneity and self-selection, which beset most non-randomized studies.20
To assess the causal impact of social distancing on the SARS-CoV-2 infection rate, we implement our empirical analysis in a differences-in-differences (DID) setting, in which the onset of the protests marks the start of the treatment period and the counties in which protests took place form the treatment group. This paper differs from its predecessors in that rather than investigating the effectiveness of social distancing following the imposition of social distancing restrictions, it examines their effectiveness as social distancing practices are abruptly relaxed. Furthermore, this study controls explicitly for social distancing restrictions imposed by states in the period surrounding the protests, as well as for the concurrent increase in social mobility. Establishing the effectiveness of social distancing practices in a statistically reliable way has important public health implications, as states are in the midst of relaxing the social distancing restrictions initially imposed in March 2020.
2 Methods
2.1 Data and Sample Description
We source our U.S. COVID-19 data from the Johns Hopkins University Center for Systems Science and Engineering (CSSE) GitHub repository. This data consists of the cumulative number of confirmed cases in each county at the end of every day since the start of the outbreak in late January 2020. We calculate the number of new cases for each county and each day by subtracting the previous day’s cumulative number of confirmed cases from the cumulative number of confirmed cases at the end of the current day. This sampling procedure yields a panel data-set consisting of a total of 474,422 county-days representing 3,142 counties from all fifty states, as well as the District of Columbia (DC), for the period starting on January 22, 2020, and ending on June 20, 2020. We describe our sample in Table I, and in Figure 1 we show the counties in which protests took place according to media reports, along with the size of the first protest taking place within each county. We obtain our county-level population data and our county-level demographic data from the U.S. Census Bureau. We extract our county-level Gross Domestic Product (GDP) data from the U.S. Bureau of Economic Analysis’ (BEA) Regional Economic Accounts database (Table CAGDP1). We retrieve county-level data on the prevalence of obesity, diabetes, smoking, and hypertension from the University of Washington’s Institute for Health Metrics and Evaluation (IHME). The hypertension and obesity data are for the years 2009 and 2011, respectively, and the diabetes and smoking prevalence data are for 2012. The IHME reports hypertension and obesity data for females and males separately, so we construct a population-weighted average measure for these two covariates based on the proportion of females and males in each county, as reported by the U.S. Census Bureau.
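A minimal Python sketch of the two data-preparation steps just described follows: differencing the cumulative counts into daily new cases (expressed per 100,000 population) and sex-weighting the IHME prevalence estimates. The function names and the example county are hypothetical, and treating the first day's previous cumulative count as zero is our assumption; this is an illustration, not the authors' processing code.

```python
def daily_new_cases(cumulative):
    """Daily new cases from one county's end-of-day cumulative counts (date order).

    New cases on day t = cumulative(t) - cumulative(t - 1). The first day has no
    prior value; this sketch assumes a prior cumulative count of zero.
    Negative values can appear when cumulative counts are revised downward;
    they are left as-is here.
    """
    new_cases = []
    previous = 0
    for current in cumulative:
        new_cases.append(current - previous)
        previous = current
    return new_cases


def per_100k(count, population):
    """Express a count as a rate per 100,000 population."""
    return count * 100_000 / population


def weighted_prevalence(female_rate, male_rate, female_share):
    """Sex-weighted county prevalence from separate female and male estimates.

    female_share is the county's proportion of females; the male share is
    taken as 1 - female_share.
    """
    return female_share * female_rate + (1 - female_share) * male_rate


# Hypothetical county with 200,000 residents.
print(daily_new_cases([0, 2, 5, 5, 9]))  # [0, 2, 3, 0, 4]
print(per_100k(4, 200_000))              # 2.0
```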
The social distancing restrictions data is from the University of Washington’s State-Level Social Distancing Policies in Response to the 2019 Novel Coronavirus in the U.S. repository. The social distancing restrictions include: 1) restrictions on public gatherings exceeding 5, 10, 25, 50, 100, 250, 500, or 1,000 people, 2) limits on restaurant operations, 3) closure of specific businesses, e.g. fitness centres, gyms, casinos, etc., 4) closure of non-essential businesses, 5) stay-at-home orders for non-essential activities, 6) state curfews on non-essential activities, 7) mandated quarantines for people entering the state, 8) travel restrictions prohibiting residents from leaving the state, non-residents from entering the state, or residents from travelling across counties within the state, 9) self-isolation requirement for individuals with confirmed COVID-19 infection, and 10) mandatory wearing of masks or other mouth and nose coverings in public places. We construct our social distancing restrictions index by adding the number of restrictions that are in place in a state on any given day, based on the date at which each restriction is enacted, relaxed, or expired. Figure 2 shows the evolution of our index for randomly selected states.
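The restrictions index can be sketched as a count of the restrictions in force on a given day. In the sketch below, each restriction is represented by a hypothetical (enacted, ended) date pair, every restriction carries equal weight, and a restriction stops counting on the date it is relaxed or expires. These conventions are assumptions: the paper does not specify, for example, how the different gathering-size thresholds enter the count.

```python
from datetime import date


def restriction_index(restrictions, day):
    """Number of restrictions in force in one state on `day`.

    restrictions: list of (enacted, ended) date pairs; `ended` is None while a
    restriction remains in place. A restriction counts from its enactment date
    up to, but not including, the date it is relaxed or expires (an assumed
    convention).
    """
    return sum(
        1
        for enacted, ended in restrictions
        if enacted <= day and (ended is None or day < ended)
    )


# Hypothetical state with three restrictions.
state_restrictions = [
    (date(2020, 3, 16), date(2020, 5, 15)),  # large-gathering ban
    (date(2020, 3, 23), date(2020, 4, 30)),  # stay-at-home order
    (date(2020, 5, 1), None),                # mask mandate, still in force
]
for d in (date(2020, 3, 1), date(2020, 4, 1), date(2020, 5, 10), date(2020, 6, 1)):
    print(d, restriction_index(state_restrictions, d))
```

Evaluating the index day by day for each state produces a step series of the kind shown in Figure 2.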
We obtain our mobility data from Descartes Labs. This data consists of mobility indexes calculated at the end of every day and aggregated at the county level. The indexes, which we collectively refer to as the social mobility index, are based on geolocation reports from smartphones and other mobile devices, and track the movements of individual mobile phone subscribers. The methodology employed to construct these indexes is described in Warren et al., 2020. 21 The mobility index data is available at a daily frequency from March 1, 2020, until the end of our sample period. Thus, we lose a total of 122,538 county-day observations from the start of our sample period up until February 29, 2020, in all our regression analyses featuring this data. Figure 3 shows the mobility index for a randomly selected small and large county in the states of New York and Texas.
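As an arithmetic check on the 122,538 lost observations, assuming a balanced panel of 3,142 counties with one observation per county per day, the pre-mobility window from January 22 to February 29, 2020 (inclusive) spans 39 days:

```python
from datetime import date

N_COUNTIES = 3142
sample_start = date(2020, 1, 22)
last_day_without_mobility = date(2020, 2, 29)  # 2020 is a leap year

# Both endpoints are included, hence the + 1.
days_without_mobility = (last_day_without_mobility - sample_start).days + 1
lost_county_days = N_COUNTIES * days_without_mobility
print(days_without_mobility, lost_county_days)  # 39 122538
```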
Finally, we construct a comprehensive list of protests that took place across the U.S. Our starting point is the List of George Floyd protests in the United States assembled by Wikipedia. At the time of writing, the main Wikipedia page cited 134 news articles from national, regional, and local media outlets, and the secondary pages cited hundreds more. From these media citations, we extracted the location and the date at which the protests reportedly took place, as well as the estimated number of individuals involved in each protest. We complement this process with a search on the Dow Jones Factiva database.
2.2 Regression Specification
We examine the impact of the abrupt relaxation of social distancing practices, which occurred during the U.S. nationwide protests, on the SARS-CoV-2 infection rate with an Ordinary Least Squares (OLS) differences-in-differences (DID) panel regression equation, which is specified as follows:

$$CI_{i,j,t} = \alpha + \beta_1\,Protest_i + \beta_2\,PostGF_{i,j,t} + \beta_3\,(Protest_i \times PostGF_{i,j,t}) + \delta' X_{i,j,t} + \theta' Y_{j,t} + \gamma_j + \epsilon_{i,j,t} \qquad (1)$$

where CIi,j,t corresponds to new confirmed SARS-CoV-2 infections in county i from state j on day t, per 100,000 population. Protesti is an indicator variable which is set equal to one if a protest took place in county i, and to zero otherwise. PostGF i,j,t is an indicator variable set equal to zero from the first day of our sample period up until May 25, 2020, the day of George Floyd’s tragic death, and to one on every subsequent date. Protest × PostGF is an indicator variable which captures the interaction between Protesti and PostGF i,j,t. Xi,j,t and Yj,t are vectors of county and state characteristics which we use as control variables, with coefficient vectors δ and θ, α is a constant, and γj represents state-level fixed effects that control for time-invariant differences across states in our regressions.
In equation (1), β1 captures any differences that may exist between the SARS-CoV-2 infection rate in protest and non-protest counties, and that are unrelated to the protests. We expect this coefficient to be statistically indistinguishable from zero. β2 captures the impact of the relaxation of social distancing practices on the infection rate across all U.S. counties, following the onset of the protests. β3 captures the incremental impact of the relaxation of social distancing practices on the infection rate, specifically in counties where protests took place. Under the null hypothesis that the relaxation of social distancing practices has no causal impact on the SARS-CoV-2 infection rate, both β2 and β3 coefficients should be statistically indistinguishable from zero. In all our regressions, we cluster the standard errors at the county level to allow for arbitrary correlation over time in the error terms, ϵi,j,t, within each county.22 We perform our statistical analysis with STATA 16 and use Sergio Correia’s REGHDFE command to estimate equation (1).23
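To make the roles of β1, β2 and β3 concrete, the sketch below computes them for a stripped-down version of equation (1) with no controls and no fixed effects. In that case the 2×2 model is saturated, so its OLS coefficients equal differences of the four cell means. This is only an illustration with hypothetical data: the paper's estimates come from REGHDFE with covariates and state fixed effects, which do not reduce to simple cell means.

```python
from statistics import mean


def did_coefficients(rows):
    """Coefficients of y = b0 + b1*Protest + b2*PostGF + b3*Protest*PostGF.

    rows: iterable of (y, protest, post_gf) with protest, post_gf in {0, 1}.
    With no controls or fixed effects this 2x2 model is saturated, so its OLS
    coefficients equal differences of the four cell means.
    """
    cells = {(p, q): [] for p in (0, 1) for q in (0, 1)}
    for y, protest, post_gf in rows:
        cells[(protest, post_gf)].append(y)
    m = {cell: mean(values) for cell, values in cells.items()}
    b0 = m[(0, 0)]                                          # non-protest, pre
    b1 = m[(1, 0)] - m[(0, 0)]                              # protest vs non-protest, pre
    b2 = m[(0, 1)] - m[(0, 0)]                              # post vs pre, non-protest
    b3 = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])  # difference-in-differences
    return b0, b1, b2, b3


# Hypothetical county-days: (new cases per 100,000, Protest, PostGF)
rows = [(2, 0, 0), (4, 0, 0), (5, 1, 0), (7, 1, 0),
        (6, 0, 1), (8, 0, 1), (14, 1, 1), (16, 1, 1)]
print(did_coefficients(rows))
```

For these rows the cell means are 3, 6, 7 and 15, so β3 = (15 - 6) - (7 - 3) = 5: the extra post-onset increase in the protest cell beyond the increase shared by all counties.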
3 Covariates
In our differences-in-differences regressions, we include control variables which may influence the transmission rate of SARS-CoV-2. These control variables account for demographic, health, geographic, and income level variations across counties. For demographic indicators, we include male sex and age (60 years+) since these factors are associated with both an increased risk of testing positive for SARS-CoV-2 and greater illness severity.24 We also include ethnicity as a demographic variable to account for the increased risk of a positive SARS-CoV-2 test observed among Blacks and Hispanics. Obesity, diabetes, and hypertension are clinical risk factors included as health covariates in the regressions, as they are associated with an increased risk of severe illness, and a greater risk of mortality from COVID-19. 25 We also include smoking as a clinical risk factor, as some evidence suggests that smoking may be associated with an increased severity of COVID-19. 26 We include population density among our control variables, as higher rates of SARS-CoV-2 infections are observed in more densely populated, urban areas. 15,25 Consistent with previous research showing that residents from more economically deprived areas are more likely to test positive for SARS-CoV-2, we use real GDP per capita to control for income in our regressions. 25
4 Results
4.1 Impact of Protests on SARS-CoV-2 Infections
We report results from regression equation (1) in Table III. In Model (1), the coefficient associated with Protest is equal to 1·22 (95%CI: 0·79–1·65) and is highly significant. This implies that, over the entire sample period, the SARS-CoV-2 infection rate is 1·22 cases per day, per 100,000 population, higher in the counties where protests took place than in the counties where no protests took place. The coefficient associated with PostGF is positive and highly significant, implying that the SARS-CoV-2 infection rate increases by 3·39 cases per day, per 100,000 population, across the U.S. following the onset of the protests. Finally, the coefficient associated with the Protest × PostGF interaction indicates that the infection rate is even greater in the counties in which protests actually took place, following the onset of the protests (4·01; 95%CI: 3·24–4·78). To put this number into perspective, recall that the average number of new infections across all counties is equal to 5 per day, per 100,000 population, in the week preceding the onset of the protests (see Column (2) of Table I). Using this number as a reference point, COVID-19 cases increase by a further 80·2% (4·01/5), on average, in protest counties relative to non-protest counties, and by 3·39 + 4·01 = 7·40 cases per day, per 100,000 population, or 148% overall. Models (2)-(6) of Table III provide evidence that is consistent with Model (1). The coefficient associated with Protest loses its statistical significance in Models (2) and (6), suggesting that the higher overall infection rate of protest counties is attributable to cross-county differences in demography. We note that the coefficient associated with PostGF is very stable across the six models, ranging between 3·33 and 3·39. Likewise, the coefficient associated with our Protest × PostGF interaction is quite stable across the six models, ranging between 2·80 in Model (6) and 4·01 in Model (1).
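The percentages above are each coefficient divided by the pre-protest baseline of 5 new cases per day, per 100,000 population. The short Python check below reproduces them from the reported coefficients; the same calculation gives the 61·2% and 34·6% figures reported for Model (4) of Table IV.

```python
BASELINE = 5.0  # mean new cases per day per 100,000, week before the protests (Table I)


def pct_of_baseline(effect, baseline=BASELINE):
    """Express a coefficient (cases per day per 100,000) as a % of the baseline."""
    return 100 * effect / baseline


# Table III, Model (1)
b_post, b_interaction = 3.39, 4.01
print(round(pct_of_baseline(b_interaction), 1))           # 80.2
print(round(b_post + b_interaction, 2))                   # 7.4
print(round(pct_of_baseline(b_post + b_interaction), 1))  # 148.0

# Table IV, Model (4)
print(round(pct_of_baseline(3.06), 1), round(pct_of_baseline(1.73), 1))  # 61.2 34.6
```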
Although it is not possible to determine which of these six models best describes the causal impact of relaxing social distancing practices on the spread of SARS-CoV-2, out of conservatism we employ our omnibus regression Model (6), which yields the smallest Protest × PostGF coefficient, from this point on, and refer to this specification as our baseline regression model.
Based on the evidence reported in Table III, we can safely reject the null hypothesis that relaxing social distancing practices has no impact on the spread of SARS-CoV-2. Furthermore, as our placebo tests (Sub-section 4.3) indicate, we can rule out the possibility that this important finding is attributable to chance with a high degree of confidence.
4.2 Social Distancing Restrictions and Mobility
In the period preceding the onset of the protests, the number of new COVID-19 cases began to drop steadily across the country.3 Accordingly, several states began to unwind their social distancing restrictions in a carefully staged manner, as Figure 2 illustrates for Alabama, California, Florida, and New York. In these four states, our social distancing restrictions index rises steadily starting in mid-March, and a slow unwind begins by mid-April. Notably, while social distancing restrictions were being relaxed across the nation, social mobility was on the rise (see Figure 3). Consequently, the concurrent relaxation of social distancing restrictions and increase in social mobility during the event period may have prompted individuals to relax their social distancing practices, in which case the effect that we document in Table III would be partly contaminated by these contemporaneous changes. We address this issue in Table IV, where we include our social distancing restrictions and social mobility indexes in our baseline DID regression equation (1) as additional control variables.
Table IV includes four models. Model (1) corresponds to Model (6) from Table III, Model (2) includes the additional control for social distancing restrictions, Model (3) includes the additional control for social mobility, and Model (4) includes both controls. In Model (2), we see a drop from 3·33 to 3·01 in the coefficient associated with PostGF, relative to Model (1), but the coefficient remains highly significant. We observe a similar drop in the coefficient associated with the Protest × PostGF interaction, from 2·80 to 2·29, with no drop in its statistical significance.
In Model (3), controlling for social mobility has a slightly larger impact on the coefficients associated with PostGF and Protest × PostGF: the former drops from 3·33 to 2·97, while the latter drops from 2·80 to 1·81. Social distancing restrictions and social mobility are likely correlated with one another; for instance, we should expect social mobility to rise when travel restrictions are lifted. When we control for both factors in Model (4), the coefficient associated with PostGF is equal to 3·06 (95%CI: 2·47–3·65), which is highly significant, and the coefficient associated with Protest × PostGF is equal to 1·73 (95%CI: 0·59–2·87), which is also highly significant.
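Because each reported interval is centred on its point estimate, the implied standard error and test statistic can be backed out from the 95% confidence interval. The sketch assumes a normal-approximation interval (estimate ± 1·96 × SE); the critical value behind the county-clustered intervals may differ slightly, and the reported bounds are rounded, so the results are approximate. The helper name is hypothetical.

```python
from statistics import NormalDist


def implied_se_z_p(estimate, lower, upper, level=0.95):
    """Back out SE, z and two-sided p from a reported CI, assuming a
    symmetric normal-approximation interval: estimate +/- z_crit * SE."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Table IV, Model (4)
for name, est, lo, hi in [("PostGF", 3.06, 2.47, 3.65),
                          ("Protest x PostGF", 1.73, 0.59, 2.87)]:
    se, z, p = implied_se_z_p(est, lo, hi)
    print(f"{name}: SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

For Protest × PostGF this gives an implied standard error of roughly 0·58 and z of roughly 2·97, which is consistent with significance at the 1% level.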
In summary, after controlling for the reduction in social distancing restrictions and the increase in social mobility that occurred following the onset of the protests, we still observe a significant increase in the number of daily COVID-19 cases across all counties (61·2% relative to the week preceding the event), and a further increase of 1·73 cases (34·6%) in the counties where protests took place. We attribute the latter to the relaxation of social distancing practices during the protests. This interpretation is supported by the abundance of video footage demonstrating that the mass protests brought people into close physical proximity to one another, in contravention of social distancing restrictions that were in place at the time.
4.3 Placebo Test
In Table V, we report the results of a placebo test assessing whether the causal impact of the protests on the spread of SARS-CoV-2 that we document in Tables III and IV can be attributed to chance. For this purpose, we implement a Monte Carlo simulation exercise centered on our baseline DID panel regression specification (1), i.e. Model (4) of Table IV. In this simulation, we pick a random date between February 6, 2020, and June 1, 2020, to represent the onset of the protests and we assign counties to the protest group randomly, in proportion to the actual fraction of counties that took part in the protests (18%). We carry out this exercise 10,000 times, and each time, we estimate our model with the simulated protest onset date and protest county pair, and collect the key parameter estimates from the regression, i.e. Protest, PostGF, and Protest × PostGF, along with their county-cluster robust t-statistics.
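The placebo procedure can be sketched in Python as follows. To keep the example self-contained, the sketch replaces the full baseline specification (covariates, state fixed effects and county-clustered t-statistics) with the saturated 2×2 DID model, whose coefficients are differences of cell means, and runs on a synthetic panel with no true effect. The function name, the synthetic data, and the convention that "post" starts the day after the drawn onset (mirroring PostGF) are assumptions, not the authors' implementation.

```python
import random


def simulate_placebo(panel, n_reps, onset_range, protest_share=0.18, seed=0):
    """Placebo Monte Carlo for a simple 2x2 differences-in-differences design.

    panel: dict mapping county id -> list of daily outcomes (equal lengths).
    onset_range: (first, last) day indices from which a placebo onset is drawn;
        days strictly after the onset are treated as 'post'.
    Returns a list of (b1, b2, b3) tuples, one per replication, from the
    saturated model y = b0 + b1*Protest + b2*Post + b3*Protest*Post.
    """
    rng = random.Random(seed)
    counties = sorted(panel)
    n_treated = round(protest_share * len(counties))
    draws = []
    for _ in range(n_reps):
        onset = rng.randint(*onset_range)               # random placebo onset day
        treated = set(rng.sample(counties, n_treated))  # random placebo protest counties
        sums = {(p, q): 0.0 for p in (0, 1) for q in (0, 1)}
        counts = dict.fromkeys(sums, 0)
        for county in counties:
            p = int(county in treated)
            for t, y in enumerate(panel[county]):
                q = int(t > onset)
                sums[(p, q)] += y
                counts[(p, q)] += 1
        m = {cell: sums[cell] / counts[cell] for cell in sums}
        b1 = m[(1, 0)] - m[(0, 0)]
        b2 = m[(0, 1)] - m[(0, 0)]
        b3 = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
        draws.append((b1, b2, b3))
    return draws


# Synthetic null panel: 100 hypothetical counties, 40 days, no true effect.
data_rng = random.Random(1)
panel = {c: [data_rng.gauss(5, 2) for _ in range(40)] for c in range(100)}
draws = simulate_placebo(panel, n_reps=200, onset_range=(10, 30))
mean_b3 = sum(d[2] for d in draws) / len(draws)
print(round(mean_b3, 3))
```

Under the null the average placebo interaction coefficient should sit close to zero; the actual estimates are then compared with the spread of these draws.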
Across the simulated regressions, the estimated effect of the randomly drawn onset date and the randomly assigned protest counties on the SARS-CoV-2 infection rate is negligible, on average. Only 25% of the simulated coefficients are positive, and at most 1% of them are statistically significant. Furthermore, the coefficients associated with PostGF and Protest × PostGF from the actual regressions (Panel B), i.e. 3·06 and 1·73, are well above the 95% confidence thresholds inferred from the simulated distribution, i.e. 1·35 and 1·37, respectively. Indeed, our baseline regression coefficients fall at the very top end of the simulated distribution. This implies that we can reject, with at least 99% confidence, the null hypothesis that the impact of the protests on the SARS-CoV-2 infection rate that we document is due to pure chance.
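One way to turn a simulated distribution into a threshold and a confidence statement of this kind is to take the upper 95th percentile of the placebo coefficients, and the share of placebo draws at least as large as the actual estimate. The helper functions and placebo values below are hypothetical, and reading the "95% confidence thresholds" as one-sided upper percentiles is our assumption.

```python
import math


def upper_threshold(draws, level=0.95):
    """Upper `level` quantile of simulated coefficients (nearest-rank rule)."""
    ordered = sorted(draws)
    rank = math.ceil(level * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def empirical_p_value(draws, observed):
    """One-sided Monte Carlo p-value: share of placebo draws at least as large
    as the observed estimate, with a +1 correction so it is never zero."""
    exceed = sum(1 for d in draws if d >= observed)
    return (exceed + 1) / (len(draws) + 1)


# Hypothetical placebo coefficients 0.01, 0.02, ..., 1.00.
placebo = [k / 100 for k in range(1, 101)]
print(upper_threshold(placebo))          # 0.95
print(empirical_p_value(placebo, 1.73))  # 1/101, about 0.0099
```

With 10,000 replications and no placebo draw reaching the actual estimate, this p-value would be 1/10,001, well below 0·01.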
5 Discussion
In this paper, we employ the natural experimental setting created by the U.S. protests precipitated by George Floyd’s tragic death to document the causal impact of social distancing measures on the spread of SARS-CoV-2. Using a DID analysis, in which the onset of the protests marks the start of the treatment period and the counties in which protests reportedly took place form the treatment group, we document a country-wide increase of 3·06 cases per day, per 100,000 population, following the onset of the protests, and a further increase of 1·73 cases per day, per 100,000 population, in the counties in which the protests took place. Relative to the average number of new cases per day during the week preceding the onset of the protests, this represents a 61·2% country-wide increase in COVID-19 cases, and a further 34·6% increase in the protest counties.
The increase in the SARS-CoV-2 infection rate that we document in this study cannot be explained by the relaxation of state-imposed social distancing restrictions in the period surrounding the protests, nor by the concurrent increase in social mobility during the protest period, as we control explicitly for these two factors in our regressions. It follows that the increase in SARS-CoV-2 infections that we observe following the onset of the protests can be attributed to the relaxation of social distancing practices. The causal impact is also robust both to the inclusion of a host of covariates that are known to influence the SARS-CoV-2 infection rate, and to placebo tests that enable us to rule out the possibility that our findings are attributable to chance.
Our study is not without limitations. In particular, over 70 testing centers across the U.S. were closed following the onset of the protests; therefore, the increase in SARS-CoV-2 infections that we document likely underestimates the true increase. We are also unable to assess protest participants’ vulnerability (e.g. age, underlying health conditions, personal protective wear), and variability along these dimensions may influence the risk of SARS-CoV-2 infection. Additionally, we cannot control for the actual degree of physical proximity between participants, which would affect the transmission rate of SARS-CoV-2 during the protests. Moreover, we rely on the accuracy of media reports to identify the counties in which protests took place. Finally, we do not account for the magnitude of the protests in each county; however, expressing the case counts as rates rather than levels should minimize any potential scale-related effects.
Future research using this experimental setting could use machine learning tools to analyze protest videos and determine the relative contributions of participant demographics, the degree of physical distancing, and the extent and type of personal protective wear to the spread of SARS-CoV-2. Taken together, this study demonstrates that, when controlling for social mobility and restrictions, social distancing practices causally impact the spread of SARS-CoV-2. As states are in the midst of relaxing the social distancing restrictions initially imposed in March 2020, establishing the effectiveness of social distancing practices in a statistically reliable way has important public health implications. Our research informs policy makers and provides insights regarding the usefulness of social distancing as an intervention to minimize the spread of SARS-CoV-2, and reduce the risk of a second, and possibly a third, wave of COVID-19.
Data Availability
This study uses publicly accessible data exclusively.
https://github.com/descarteslabs/DL-COVID-19
https://github.com/COVID19StatePolicy/SocialDistancing
http://ghdx.healthdata.org/record/ihme-data/united-states-hypertension-estimates-county-2001-2009
http://ghdx.healthdata.org/record/ihme-data/united-states-smoking-prevalence-county-1996-2012
http://ghdx.healthdata.org/record/ihme-data/united-states-diabetes-prevalence-county-1999-2012
https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
7 Funding
LG acknowledges the financial support from the Smith School of Business Distinguished Faculty Fellowship at Queen’s University.
8 Author Contributions
LG conceived the study, and all authors contributed to the final study design. LG performed the data analysis, created the tables and figures, and wrote the methods and results sections in the initial draft of the manuscript. SG and JL conducted the literature search, and assisted LG with the data collection. All authors contributed substantially to the interpretation of the data, and equally to the write-up. All authors wrote and approved the final manuscript submission.
9 Declaration of Interests
The authors declare no competing interests.
10 Data Sharing
This study uses publicly accessible data exclusively.
6 Acknowledgments
We wish to express our sincere thanks to Descartes Labs for making their mobility data available to us.